Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Shotgun Metagenomic Data Analysis ◾ 311

allow more accurate taxonomic group assignment. There are several programs for assembly-free
classification and profiling of microbial communities in metagenomic samples. Kaiju [3]
translates the reads and uses the Burrows–Wheeler transform (BWT) to find maximum exact
matches on the protein level against a reference protein database such as NCBI RefSeq, and
then assigns taxa using the NCBI taxonomy. CLARK [4] (CLAssifier based on Reduced K-mers)
builds a large index of the k-mers of all target sequences and then removes the k-mers that
are shared among targets, so that each target is described by unique k-mers, which are used
for taxonomic classification. Kraken [5] breaks each read into k-mers, maps every k-mer to
the lowest common ancestor (LCA) of the genomes that contain it, and classifies the read
along the highest-scoring root-to-leaf path of the resulting classification tree, which
helps discriminate closely related microbes. Centrifuge [6] is a rapid classifier that
requires little memory and a relatively small index (only 5.8 GB for bacterial, viral, and
human genomes), so it can run on desktop computers. Centrifuge uses an indexing scheme that
is based on the BWT and the Ferragina–Manzini (FM) index. These programs are only examples;
other classifiers use different algorithms.
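The idea behind CLARK's reduced k-mer index can be illustrated with a toy sketch. This is not CLARK's implementation: the function name discriminative_kmers and the tab-separated "target, sequence" input format are assumptions made for illustration, and real tools also handle reverse complements and far larger inputs. The sketch only shows the principle of keeping the k-mers that occur in exactly one target.

```shell
# Print "target<TAB>k-mer" for every k-mer that occurs in exactly one target.
# usage: discriminative_kmers K   (reads "target<TAB>sequence" lines on stdin)
discriminative_kmers() {
    awk -v k="$1" -F '\t' '
        {
            # Slide a window of length k over the sequence in column 2.
            for (i = 1; i <= length($2) - k + 1; i++) {
                km = substr($2, i, k)
                # Count each (k-mer, target) pair only once.
                if (!((km, $1) in seen)) {
                    seen[km, $1] = 1
                    count[km]++
                    owner[km] = $1
                }
            }
        }
        END {
            # A k-mer seen in a single target is discriminative for it.
            for (km in count)
                if (count[km] == 1) print owner[km] "\t" km
        }
    ' | sort
}

# Two toy targets that share the 3-mer CGT.
printf 'A\tACGT\nB\tCGTA\n' | discriminative_kmers 3
```

With these two toy targets, the shared 3-mer CGT is dropped, leaving ACG for target A and GTA for target B.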

Most taxonomic classifiers for metagenomic data build an index from a genomic database of
known species and then use that index to assign taxa to the metagenomic reads. The majority
of the classifiers require a large storage space for the database files and a large amount
of memory for the indexing and classification processes. Kaiju and Kraken require a lot of
memory (around 128–512 GB). Therefore, we recommend using these classifiers only if you
have enough computational resources. To use any of these classifiers, you need to download
a database, build an index from it, and then perform the classification.
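Before committing to a large download, it can help to check what the machine actually offers. The following is a minimal sketch for Linux only (it reads /proc/meminfo). The function names total_mem_gib and free_disk_gib and the required_mem_gib threshold are illustrative assumptions; the threshold is simply the lower end of the memory range quoted above, not a figure published by any of the tools.

```shell
# Total physical memory in whole GiB (Linux only: reads /proc/meminfo).
total_mem_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo
}

# Free space, in whole GiB, on the filesystem that holds a directory.
free_disk_gib() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

required_mem_gib=128   # assumption: lower end of the range quoted above
mem=$(total_mem_gib)
echo "memory: ${mem} GiB, free disk in current directory: $(free_disk_gib .) GiB"
if [ "$mem" -lt "$required_mem_gib" ]; then
    echo "less than ${required_mem_gib} GiB of RAM: consider a lighter classifier or a smaller database"
fi
```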

Kaiju installation instructions are available at “https://github.com/bioinformatics-centre/kaiju”.
You can install it by running the following commands:

git clone https://github.com/bioinformatics-centre/kaiju.git

cd kaiju/src

make

Then, add the kaiju “bin” directory to your PATH by adding the following line to the
“.bashrc” file. Replace YOUR_PATH with the directory into which you cloned kaiju.

export PATH="YOUR_PATH/kaiju/bin":$PATH

You must restart the terminal or run “source ~/.bashrc” to make the change active. Run
the “kaiju” command to check that it has been installed.
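If the command is not recognized, the PATH change has usually not taken effect. A small check along the following lines can confirm whether the shell can find the program; the helper name check_tool is an assumption for illustration, while command -v is the standard shell built-in for locating a command.

```shell
# Report whether a command is reachable through the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found in PATH" >&2
        return 1
    fi
}

check_tool kaiju || echo "check the export line in ~/.bashrc, then run: source ~/.bashrc"
```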

Before using kaiju, you need a reference database. You can download the RefSeq database
from the NCBI, or you can download it from the kaiju website at
“https://kaiju.binf.ku.dk/server”. To download it from the NCBI and build the index, use
the following:

mkdir kaijudb

cd kaijudb

kaiju-makedb -s refseq
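Once kaiju-makedb has finished, a short loop can confirm that the expected files were produced. This is a sketch only: the helper name missing_db_files is an assumption, and the three file names are the ones this section gives for the refseq database in the “kaijudb” directory; the location may differ between kaiju versions.

```shell
# List which of the expected database files are missing (or empty) in a directory.
# Prints one missing file name per line; returns 1 if anything is missing.
missing_db_files() {
    dir=$1
    status=0
    for f in nodes.dmp names.dmp kaiju_db_refseq.fmi; do
        if [ ! -s "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return $status
}

# Example: run from the parent of the kaijudb directory.
if missing_db_files kaijudb; then
    echo "all database files present"
fi
```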

The download takes a long time and requires a large amount of storage space. When the
database has been downloaded, make sure that the “nodes.dmp”, “kaiju_db_refseq.fmi”, and
“names.dmp” files are present in the “kaijudb” directory. You may need to decompress